EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data
Authors
Abstract
Classification problems with an imbalanced class distribution have received increasing attention within the machine learning community over the last decade. They arise in a growing number of real-world situations and pose a challenge to standard machine learning techniques. We propose a new hybrid method specifically tailored to handle class imbalance, called EPRENNID. It performs an evolutionary prototype reduction focused on providing diverse solutions to prevent the method from overfitting the training set. It also allows us to explicitly reduce the underrepresented (minority) class, which the most common preprocessing solutions for class imbalance usually protect. As part of the experimental study, we show that the proposed prototype reduction method outperforms state-of-the-art preprocessing techniques. The preprocessing step yields multiple prototype sets that are later used in an ensemble, which combines nearest neighbor classifiers through a weighted voting scheme. EPRENNID is experimentally shown to significantly outperform previous proposals. © 2016 Elsevier B.V. All rights reserved.
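The abstract describes the pipeline only at a high level: an evolutionary search produces several diverse prototype sets, and a weighted vote over nearest neighbor classifiers built on those sets gives the final prediction. The following is a minimal sketch of that overall shape, not the authors' algorithm. It assumes a simple (1+1) bit-flip evolutionary search in place of EPRENNID's actual search, diversity coming only from independent random runs, a fitness equal to the leave-one-out geometric mean of per-class recalls, and member weights equal to that fitness. All function names (`evolve_subset`, `build_ensemble`, `ensemble_predict`, etc.) are hypothetical. As in the abstract, the selection mask may drop instances of any class, including the minority class.

```python
import numpy as np


def one_nn_predict(P_X, P_y, X):
    """Assign each row of X the label of its nearest prototype (squared Euclidean)."""
    d = ((X[:, None, :] - P_X[None, :, :]) ** 2).sum(axis=2)
    return P_y[d.argmin(axis=1)]


def geometric_mean(y_true, y_pred):
    """Geometric mean of per-class recalls, an imbalance-aware accuracy measure."""
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.prod(recalls) ** (1.0 / len(recalls)))


def fitness(mask, X, y):
    """Leave-one-out G-mean of 1-NN using only the prototypes selected by mask."""
    # A subset missing a class can never predict it, so it scores zero.
    if set(np.unique(y[mask])) != set(np.unique(y)):
        return 0.0
    idx = np.flatnonzero(mask)
    d = ((X[:, None, :] - X[None, idx, :]) ** 2).sum(axis=2)
    # Leave-one-out: a selected prototype may not classify itself.
    d[idx, np.arange(len(idx))] = np.inf
    return geometric_mean(y, y[idx][d.argmin(axis=1)])


def evolve_subset(X, y, rng, iters=200):
    """Hypothetical (1+1) evolutionary search over binary masks; any class may be reduced."""
    n = len(y)
    mask = rng.random(n) < 0.5
    best = fitness(mask, X, y)
    for _ in range(iters):
        # Flip each bit independently with probability 1/n.
        child = mask ^ (rng.random(n) < 1.0 / n)
        score = fitness(child, X, y)
        if score >= best:
            mask, best = child, score
    return mask, best


def build_ensemble(X, y, n_members=5, seed=0):
    """Independent search runs drawing from one generator give diverse prototype sets."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        mask, score = evolve_subset(X, y, rng)
        if score > 0:  # drop members with zero fitness (e.g. a class missing)
            members.append((X[mask], y[mask], score))
    return members


def ensemble_predict(members, X, classes):
    """Weighted vote: each member adds its fitness to the class its 1-NN predicts."""
    votes = np.zeros((len(X), len(classes)))
    for P_X, P_y, weight in members:
        pred = one_nn_predict(P_X, P_y, X)
        for k, c in enumerate(classes):
            votes[:, k] += weight * (pred == c)
    return classes[votes.argmax(axis=1)]


# Toy imbalanced data: 90 majority points around (0, 0), 10 minority points around (3, 3).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (90, 2)), rng.normal(3.0, 1.0, (10, 2))])
y = np.array([0] * 90 + [1] * 10)
members = build_ensemble(X, y)
pred = ensemble_predict(members, X, np.unique(y))
print(len(members), "members, training G-mean:", round(geometric_mean(y, pred), 3))
```

The acceptance rule `score >= best` lets the search drift across equally good subsets, which is one cheap way to obtain different prototype sets from different runs. Weighting votes by training fitness is only one plausible choice; the weighting and diversity mechanisms used in EPRENNID itself may differ.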
Similar resources
Prototype reduction techniques: A comparison among different approaches
The two main drawbacks of nearest neighbor based classifiers are high CPU costs when the training set contains many samples and performance that is extremely sensitive to outliers. Several attempts to overcome these drawbacks have been proposed in the pattern recognition field, aimed at selecting or generating an adequate subset of prototypes from the training set. The problem addressed in th...
Differential evolution for optimizing the positioning of prototypes in nearest neighbor classification
Nearest neighbor classification is one of the most widely used and well-known methods in data mining. Its simplest version has several drawbacks, such as low efficiency, high storage requirements and sensitivity to noise. Data reduction techniques have been used to alleviate these shortcomings. Among them, prototype selection and generation techniques have been shown to be very effective. Positioning ...
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to many kinds of library resources. However, classifying documents within a large amount of data is still an issue and demands time and energy to find certain documents. Classifying similar documents into specific classes can reduce the time needed to search for the required data, particularly text documents. This is further facilitated by using Artificial...
Improved Fuzzy-Optimally Weighted Nearest Neighbor Strategy to Classify Imbalanced Data
Learning from imbalanced data is one of the pressing issues of the era. Traditional classification methods exhibit performance degradation when dealing with imbalanced data sets, owing to the skewed distribution of data across classes. Among the various suggested solutions, instance-based weighted approaches have proven effective in such cases. In this paper, we are proposing a new fuzzy weighted near...
Journal: Neurocomputing
Volume 216
Publication date: 2016